Hosted clone-free portfolio report (free tier) #102
Merged
Add an API-only scoring path so an arbitrary public GitHub user can be scored without cloning any repository: the engine behind the hosted 'paste your username' report.

- github_client: get_repo_tree (Git Trees API, recursive) + get_file_content (Contents API, base64); fail-soft with logging on unexpected statuses and a tree-truncation signal.
- api_checkout: materialize a sparse on-disk skeleton from the tree (dirs + presence files + curated file content), path-traversal + null-byte guarded; drop-in replacement for cloner.clone_workspace.
- api_only: score_repos_api_only / audit_user_api_only run the existing, unmodified 13-analyzer engine against the skeleton.

Interactive 'fast' mode skips slow async stats endpoints (~10x faster on live scans). OSS CLI and analyzers unchanged. New unit tests cover the client methods, materializer, and orchestrator; live-verified on a public user.
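The path-traversal and null-byte guard in api_checkout could look roughly like the sketch below. This is an illustration only: the function name safe_target and its exact rejection rules are assumptions, not the code in this PR.

```python
from __future__ import annotations

from pathlib import Path


def safe_target(root: Path, repo_path: str) -> Path | None:
    """Resolve a Git tree entry under root, or return None if it is unsafe.

    Hypothetical helper: the real materializer may apply different rules.
    """
    # Reject empty paths, embedded null bytes, and absolute paths outright.
    if not repo_path or "\x00" in repo_path or repo_path.startswith(("/", "\\")):
        return None
    root = root.resolve()
    candidate = (root / repo_path).resolve()
    # After ".." segments are collapsed, the target must still sit inside root.
    if candidate == root or not candidate.is_relative_to(root):
        return None
    return candidate
```

A materializer built this way would skip any tree entry for which the helper returns None rather than writing it to disk.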
Add GET /api/report/{username} wrapping audit_user_api_only(fast=True),
returning ApiOnlyReport.to_dict() JSON. Plain-def route offloads the
blocking, network-bound scan to FastAPI's threadpool; GitHubClient is
injected via a dependency for test/deploy override and server-side token.
- Validates the username via the existing validate_username gate (422).
- Maps GitHub errors: 404 not-found, 429 rate-limit (429 or 403 w/ zero
quota), 403 forbidden, 502 for other HTTP/network/client errors.
- Clamps repos scored to MAX_REPOS_CAP to bound public cost.
- 14 endpoint tests; wired into the serve app factory.
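
The error mapping listed above can be pictured as a small pure function. This is a sketch under assumptions: map_github_error and its parameters are hypothetical names, and a status of None stands in for network or client failures that never produced an HTTP response.

```python
from __future__ import annotations


def map_github_error(status: int | None, rate_limit_remaining: int | None = None) -> int:
    """Translate an upstream GitHub failure into the status the endpoint returns.

    Hypothetical helper illustrating the mapping described in the commit message.
    """
    if status == 404:
        return 404  # user not found
    if status == 429:
        return 429  # explicit rate limit
    if status == 403:
        # GitHub also signals an exhausted quota with 403; treat that as 429.
        return 429 if rate_limit_remaining == 0 else 403
    # Any other HTTP status, or no response at all, is a bad-gateway condition.
    return 502
```

Keeping the mapping free of framework types makes it easy to unit-test separately from the route.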
Add CORSMiddleware to the app factory with origins resolved from GHRA_CORS_ORIGINS (defaults to the local Next.js dev server). GET-only, no credentials — the report endpoint is public and unauthenticated.
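
Resolving the allowed origins from GHRA_CORS_ORIGINS might look like the following sketch. The comma-separated format, the helper name, and the default value http://localhost:3000 are assumptions for illustration; the commit only states that the default is the local Next.js dev server.

```python
from __future__ import annotations

import os
from typing import Mapping

# Assumed default: the usual local Next.js dev server address.
DEFAULT_ORIGINS = ("http://localhost:3000",)


def resolve_cors_origins(env: Mapping[str, str] | None = None) -> list[str]:
    """Parse a comma-separated origin list, falling back to the dev default."""
    env = os.environ if env is None else env
    raw = env.get("GHRA_CORS_ORIGINS", "")
    # Drop blanks and trailing slashes so origins compare cleanly.
    origins = [part.strip().rstrip("/") for part in raw.split(",") if part.strip()]
    return origins or list(DEFAULT_ORIGINS)


# In the app factory this would feed FastAPI's CORSMiddleware, for example:
#   app.add_middleware(CORSMiddleware, allow_origins=resolve_cors_origins(),
#                      allow_methods=["GET"], allow_credentials=False)
```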
Add a Next.js 15 / React 19 App Router app under web/ that powers the free hosted report: a username form does a client-side fetch to the FastAPI /api/report endpoint and renders the result with a top-fixes framing (grades, repo health, flags, and the engine's ranked action candidates as the hero of each card). Repos sort worst-health-first.

- lib/api.ts: typed fetch client with per-status messages + a boundary shape guard; lib/url.ts: https-only href allowlist (XSS guard).
- ReportExplorer: client-side idle/loading/done/error state machine with ARIA live regions; ReportView + RepoCard are presentational.
- Dark editorial design, color-coded grade chips, mono accents.
- Typechecks clean, production build passes, visually verified end to end against a live GitHub user (8 cards rendered).
Add a pluggable hosting layer so the free report endpoint survives a second visitor: an in-memory KV store (thread-safe, lazy-expiring, with size-triggered reaping) backs a report cache (1h TTL default) and a fixed-window per-IP rate limiter (20/hr default). A Redis/Upstash backend drops in via GHRA_REDIS_URL for multi-instance deploys.

Endpoint flow is now throttle -> validate -> cache get/hit -> scan -> cache put. Cache verified live: cold scan 6.3s, warm hit 1.5ms.

Hardening from review:
- Redis incr uses plain EXPIRE on first hit (no NX): works on all server versions, never extends the window.
- X-Forwarded-For honored only when GHRA_TRUST_FORWARDED_FOR is set (default off): XFF is spoofable, so the default keys on the direct peer.
- Counter/value stores reap expired entries past a threshold to bound memory under unique-IP churn.

99 tests pass; ruff + mypy clean.
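
A fixed-window limiter with lazy expiry, as described above, can be sketched in a few lines. The class name, constructor parameters, and injected clock are assumptions for illustration; the PR's in-memory store also reaps expired entries past a size threshold, which this sketch omits.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


class FixedWindowLimiter:
    """In-process fixed-window counter keyed by client (hypothetical sketch)."""

    def __init__(self, limit: int = 20, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._lock = threading.Lock()
        # key -> (hits in the current window, time at which the window ends)
        self._counters: dict[str, tuple[int, float]] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        with self._lock:
            count, expires_at = self._counters.get(key, (0, 0.0))
            if now >= expires_at:
                # Lazy expiry: the window is set on the first hit only,
                # so later hits never extend it.
                count, expires_at = 0, now + self._window
            count += 1
            self._counters[key] = (count, expires_at)
            return count <= self._limit
```

The window end is fixed on the first hit and never pushed back, which mirrors the "plain EXPIRE on first hit" behaviour noted for the Redis backend.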
Load Space Grotesk (display/body) and JetBrains Mono (code/labels) via next/font, replacing the system-ui stack with intentional, self-hosted faces. Wired through the --font-sans / --font-mono CSS variables.
The report cache keys on username alone, but the endpoint accepted a max_repos query param — a report cached for one cap could be served to a request expecting another. Remove the public knob (MAX_REPOS_CAP already bounds cost); the scan is always capped server-side, so a username fully determines its cached report.
When the client has a token, list a user's repos via the existing bulk_fetch_repos GraphQL query — one paginated call that also returns per-repo language byte breakdowns, so metadata.languages is now populated (REST left it empty). Falls back to REST list_repos when unauthenticated, when GraphQL returns no user (clean 404), or on any GraphQL error. Live-verified on octocat: language breakdowns populate, grades stable.
Step 4, the 'earn the tier' instrumentation:

Backend:
- POST /api/waitlist captures emails (Pydantic-validated, throttled on a separate per-IP bucket so browsing reports never blocks signup).
- SqliteWaitlistStore: durable, dedup on lowercased email, thread-safe via lock + contextlib.closing per connection. Path from GHRA_WAITLIST_DB or under the app output dir.

Frontend:
- Shareable /u/[username] route: the form now routes there (via useTransition) instead of fetching in place, so every report has a URL.
- ReportLoader fetches client-side with an AbortController that cancels the orphaned scan on unmount/username change.
- WaitlistForm email capture on the report, with idle/submitting/done/error states; CORS now allows POST.

Verified live end to end: home → submit → /u/octocat → 8 cards → waitlist signup persists with source attribution. 131 backend tests pass; web typechecks + builds; reviewer findings from both stacks addressed.
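
The store pattern described for SqliteWaitlistStore (one lock, a short-lived connection per call wrapped in contextlib.closing, dedup on the lowercased email) might be sketched as follows. The class name WaitlistStoreSketch, the table schema, and the method names are assumptions, not the PR's code.

```python
from __future__ import annotations

import contextlib
import sqlite3
import threading
from pathlib import Path


class WaitlistStoreSketch:
    """Durable email capture with case-insensitive dedup (hypothetical schema)."""

    def __init__(self, db_path: str | Path) -> None:
        self._path = Path(db_path)
        self._lock = threading.Lock()
        # Create the parent directory so a fresh host works before first write.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        with self._lock, contextlib.closing(sqlite3.connect(self._path)) as conn, conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS waitlist (email TEXT PRIMARY KEY, source TEXT)"
            )

    def add(self, email: str, source: str = "") -> bool:
        """Insert the email; return False if it was already on the list."""
        normalized = email.strip().lower()
        # `conn` as a context manager commits; closing() releases the handle.
        with self._lock, contextlib.closing(sqlite3.connect(self._path)) as conn, conn:
            cur = conn.execute(
                "INSERT OR IGNORE INTO waitlist (email, source) VALUES (?, ?)",
                (normalized, source),
            )
            return cur.rowcount == 1

    def count(self) -> int:
        with self._lock, contextlib.closing(sqlite3.connect(self._path)) as conn:
            return conn.execute("SELECT COUNT(*) FROM waitlist").fetchone()[0]
```

Because a repeat signup is silently ignored rather than rejected, an endpoint built on add() stays idempotent for the caller.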
Cold-scan latency was the gap (a 30-repo user took 60-90s, prolific users timed out). Two changes:

- score_repos_api_only now materializes + analyzes + scores each repo concurrently (ThreadPoolExecutor, 8 workers). Safe on the hosted path: it is authenticated (ample rate limit), uses no shared response/analyzer cache, each repo writes its own temp subdir, and requests.Session is thread-safe. Scores are byte-identical to the sequential run.
- _select_repos ranks original active work (non-fork, non-archived) by recency then stars and takes the top N, so a prolific account's report showcases their best/current repos instead of an arbitrary slice. Cap lowered 30 -> 20.

Live: octocat 19.6s -> 6.3s; tiangolo (hundreds of repos) 90s+ timeout -> 18.7s scanning his top-20, all non-fork high-star repos.
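
The two changes above can be illustrated together in a short sketch. The Repo dataclass, its field names, and both function names are assumptions made for the example; the real _select_repos and score_repos_api_only operate on the project's own types.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Repo:
    # Hypothetical minimal shape of a repo listing entry.
    name: str
    fork: bool
    archived: bool
    pushed_at: str  # ISO 8601 UTC timestamp, so string order is time order
    stars: int


def select_repos(repos: list[Repo], cap: int = 20) -> list[Repo]:
    """Keep original, active repos; rank by recency, then stars; take the top N."""
    original = [r for r in repos if not r.fork and not r.archived]
    ranked = sorted(original, key=lambda r: (r.pushed_at, r.stars), reverse=True)
    return ranked[:cap]


def score_concurrently(repos: list[Repo], score_one: Callable[[Repo], int],
                       workers: int = 8) -> list[int]:
    """Score repos in a thread pool; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_one, repos))
```

Using pool.map rather than as_completed keeps the output order equal to the input order, which is one simple way to keep concurrent results comparable to a sequential run.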
Make the hosted report deployable:

- Dockerfile (uv, frozen lockfile, serve+hosting extras) running uvicorn with --forwarded-allow-ips so the per-IP throttle sees real client IPs behind a proxy. Built + ran the image: /api/health returns 200.
- fly.toml: health check on /api/health, /data volume for the waitlist DB, non-secret config; secrets via fly secrets set.
- .dockerignore trims the context to pyproject/uv.lock + src.
- DEPLOY.md: full runbook (Fly API, Vercel frontend, Upstash Redis, env reference, Postgres follow-up).
- Add GET /api/health (reports token presence) for platform probes.
- uv.lock now includes the redis (hosting) extra.

Fix found via the container run: SqliteWaitlistStore now creates its parent dir so it works on a fresh host before the volume path is populated.
Adds a free, hosted "paste a GitHub username → portfolio health report" on top of the existing auditor engine. The OSS CLI is untouched; this is additive.
What's here
Engine (clone-free scoring)

HTTP API (FastAPI, `src/serve`)
- `GET /api/report/{username}` -> JSON report; `POST /api/waitlist` -> email capture; `GET /api/health`.

Frontend (`web/`, Next.js 15 + React 19)
- `/u/{username}` route -> report with a "top fixes" framing.
- `web/pnpm-workspace.yaml` now explicitly approves the required `sharp` build script so pnpm/Vercel installs are not blocked.

Deploy
- `Dockerfile` + `fly.toml` and `DEPLOY.md` cover the Fly API, Vercel frontend, Upstash/Redis optional cache, and env reference.
- `https://ghra-report-web.vercel.app`.
- `fly apps create ghra-report-api --yes` returns a payment-information-required error before app creation.

Verification - 2026-06-20
- `uv run --extra serve --extra hosting ruff check src/ tests/`: pass.
- `uv run --extra serve --extra hosting python -m pytest tests/test_api_checkout.py tests/test_api_only.py tests/test_serve.py tests/test_serve_api.py tests/test_serve_hosting.py tests/test_serve_waitlist.py tests/test_github_client.py -q -p no:cacheprovider`: 166 passed, 1 warning.
- `uv run --extra serve --extra hosting --extra semantic python -m pytest -q -p no:cacheprovider`: 2584 passed, 2 skipped, 2 warnings.
- `pnpm typecheck`: pass after `sharp` approval.
- `pnpm build`: pass locally and on Vercel production.
- `/api/health` reports `github_token: true`; `/api/report/octocat` returns HTTP 200, `mode=api_only`, 8 repos; `/api/waitlist` is idempotent.
- `/u/octocat` and invalid username error state.
- `origin/main` found no conflict markers; GitHub reports this PR mergeable.

Remaining operator checklist
- `saagar` Fly organization.
- `fly apps create ghra-report-api --yes`
- `fly volumes create ghra_data --region iad --size 1 --app ghra-report-api --yes`
- `GHRA_GITHUB_TOKEN`, `GHRA_CORS_ORIGINS=https://ghra-report-web.vercel.app`, optional `GHRA_REDIS_URL`
- `fly deploy`
- `https://ghra-report-api.fly.dev/api/health` returns `github_token: true`.
- `https://ghra-report-web.vercel.app/u/octocat` renders a report, not the graceful service-unreachable state.

Notes